
Remove mmap backend #536

Merged: 4 commits into master, Mar 24, 2023
Conversation

@cberner (Owner) commented Mar 24, 2023

Fixes #458

I decided that the maintenance burden of supporting the mmap backend is too high for too small a performance improvement. Removing it makes redb provably safe and trivial to port to other platforms. Proving that the mmap backend was safe seemed infeasible, and its removal greatly simplifies the codebase.

cberner added 4 commits March 24, 2023 11:41
The userspace cache backend is now within a factor of about 1.5-2x of mmap, and I've given up on proving that the mmap backend is sound. It's just too complex, and a soundness bug was already found in 1293d4f.
@cberner cberner merged commit 619aa90 into master Mar 24, 2023
@cberner cberner deleted the mmap branch March 24, 2023 21:21
@Kerollmops (Contributor)

Hey @cberner 👋

I was wondering: does the removal of the mmap backend mean that values are always allocated when retrieved? If so, can keeping too many values for too long result in an OOM?

@cberner (Owner, Author) commented Aug 21, 2024

No, it should not. There's a cache which holds pages (redb's Page abstraction, not OS pages), and pages are evicted when the cache is full. The cache size is configurable.

@Kerollmops (Contributor) commented Aug 24, 2024

Thank you very much for the answer, @cberner. I have a couple of other questions:

  • What if I keep a lot of AccessGuards for a long time? I get them from the <Table as ReadableTable>::get method.
  • Will the pages be kept alive for as long as I hold the guards, and can that therefore blow up the memory?
  • Or will the page be retrieved lazily when I call AccessGuard::value?
  • And if it's the latter, could I keep the &[u8] it returns around for a long time? What happens then?

I am just thinking about this because I use LMDB and rely extensively on its memory-mapping behavior in arroy, keeping pointers to some values and reading them in parallel.

@cberner (Owner, Author) commented Aug 24, 2024

Yes, the pages will be kept for the lifetime of the AccessGuard. If the values are small, you could copy them out of the AccessGuard and retain them for a long time. Alternatively, you can keep the key around and retrieve the value from the Table on demand, which will read from the cache if the value is cached, and from disk if it has been evicted.

@Kerollmops (Contributor) commented Aug 24, 2024

> Alternatively, you can keep the key around and retrieve the value from the Table on demand, which will read from the cache if the value is cached, and from disk if it has been evicted.

Thank you, @cberner; that seems to be the best alternative. My last question: is there a way to read a not-yet-committed transaction in parallel?

That is, could I derive multiple read transactions from the current write transaction, spawn one thread per read transaction, and read the table in each (making sure no writes are performed while those read transactions live)? Something like this hypothetical API:

// Hypothetical API sketch: lock_for_read and read_txn do not exist in redb.
table.write(wtxn, "abc", "long embedding")?;
table.write(wtxn, "def", "another long embedding")?;

let rtxn_gen = wtxn.lock_for_read()?;
let rtxn0 = rtxn_gen.read_txn()?;
let rtxn1 = rtxn_gen.read_txn()?;

// not allowed while rtxn_gen lives
// table.write(wtxn, "def", "another long embedding")?;

assert_eq!(table.read(rtxn0, "abc")?, "long embedding");
assert_eq!(table.read(rtxn1, "def")?, "another long embedding");

rtxn0.abort();
rtxn1.abort();
drop(rtxn_gen);

table.write(wtxn, "ghi", "another super long embedding")?;

@cberner (Owner, Author) commented Aug 24, 2024

Read transactions can only read committed data, so you can't directly do that. The two options would be:

  1. Tables implement Send, so you can perform multithreaded reads of the Table during the write transaction. For an example, see Add test for multithreaded reads of a Table #848.
  2. Commit (you can use a Durability::None commit, if you're concerned about latency) and then read.

Successfully merging this pull request may close these issues: Audit and document all unsafe usage.